101 research outputs found

    Interviewing During a Tight Job Market

    Various tips are discussed for PhD graduates interviewing for academic positions at research universities in Asia or North America. It is suggested that having the dissertation done before interviews provides a large degree of peace of mind. It is found that being practical about the job search package and keeping a close eye on applications increases one's confidence. It is also observed that questions during the job talk provide an opportunity to clarify and strengthen the talk, and to demonstrate this ability during the interview.

    Integrating Ontologies and Relational Data

    In recent years, an increasing number of scientific and other domains have attempted to standardize their terminology and provide reasoning capabilities through ontologies, in order to facilitate data exchange. This has spurred research into Web-based languages, formalisms, and especially query systems based on ontologies. Yet we argue that DBMS techniques can be extended to provide many of the same capabilities, with benefits in scalability and performance. We present OWLDB, a lightweight and extensible approach for the integration of relational databases and description logic based ontologies. One of the key differences between relational databases and ontologies is the high degree of implicit information contained in ontologies. OWLDB integrates the two schemes by codifying ontologies' implicit information using a set of sound and complete inference rules for SHOIN (the description logic behind OWL ontologies). These inference rules can be translated into queries on a relational DBMS instance, and the query results (representing inferences) can be added back to this database. Subsequently, database applications can make direct use of this inferred, previously implicit knowledge, e.g., in the annotation of biomedical databases. As our experimental comparison to a native description logic reasoner and a triple store shows, OWLDB provides significantly greater scalability and query capabilities, without sacrificing performance with respect to inference.
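
    As a rough illustration of the rule-as-query idea, the sketch below encodes a single inference rule (subclass transitivity) as a SQL query over a relational instance and inserts the inferred facts back into the database, iterating to a fixpoint. It uses Python with sqlite3; the table name and the one rule shown are illustrative assumptions, not OWLDB's actual SHOIN rule set.

    ```python
    import sqlite3

    # Hypothetical relational encoding of an ontology's subclass axioms.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subclass_of (sub TEXT, sup TEXT, PRIMARY KEY (sub, sup))")
    conn.executemany("INSERT INTO subclass_of VALUES (?, ?)",
                     [("Enzyme", "Protein"), ("Protein", "Molecule")])

    # One inference rule (transitivity) expressed as a query; its results,
    # the previously implicit facts, are inserted back until a fixpoint.
    while True:
        inserted = conn.execute("""
            INSERT OR IGNORE INTO subclass_of
            SELECT DISTINCT a.sub, b.sup
            FROM subclass_of a JOIN subclass_of b ON a.sup = b.sub
        """).rowcount
        if inserted == 0:
            break

    print(conn.execute("SELECT * FROM subclass_of ORDER BY sub, sup").fetchall())
    # [('Enzyme', 'Molecule'), ('Enzyme', 'Protein'), ('Protein', 'Molecule')]
    ```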

    Reconciling while Tolerating Disagreement in Collaborative Data Sharing

    In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty, uncertain, or even controversial. Collaborators are willing to share their data, and in many cases they also want to selectively import data from others -- but must occasionally diverge when they disagree about uncertain or controversial facts or values. For this reason, traditional data sharing and data integration approaches are not applicable, since they require a globally consistent data instance. Additionally, many of these approaches do not allow participants to make updates; if they do, concurrency control algorithms or inconsistency repair techniques must be used to ensure a consistent view of the data for all users. In this paper, we develop and present a fully decentralized model of collaborative data sharing, in which participants publish their data on an ad hoc basis and simultaneously reconcile updates with those published by others. Individual updates are associated with provenance information, and each participant accepts only updates with a sufficient authority ranking, meaning that each participant may have a different (though conceptually overlapping) data instance. We define a consistency semantics for database instances under this model of disagreement, present algorithms that perform reconciliation for distributed clusters of participants, and demonstrate their ability to handle typical update and conflict loads in settings involving the sharing of curated data.
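
    To make the authority-ranking idea concrete, here is a minimal Python sketch in which each participant imports only those published updates whose origin it trusts above a threshold, so participants' instances may legitimately diverge. The class and field names are hypothetical, not the paper's API.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Update:
        key: str
        value: str
        origin: str   # provenance: the participant that published this update

    class Participant:
        def __init__(self, name, trust, threshold):
            self.name = name
            self.trust = trust          # origin -> authority ranking
            self.threshold = threshold
            self.instance = {}          # this participant's local data instance

        def reconcile(self, published):
            for u in published:
                # Accept only updates with sufficient authority; lower-ranked
                # (possibly conflicting) updates are simply not imported, so
                # instances may diverge -- disagreement is tolerated.
                if self.trust.get(u.origin, 0) >= self.threshold:
                    self.instance[u.key] = u.value

    alice = Participant("alice", trust={"bob": 5, "carol": 1}, threshold=3)
    alice.reconcile([Update("gene42", "kinase", "bob"),
                     Update("gene42", "phosphatase", "carol")])
    print(alice.instance)   # {'gene42': 'kinase'} -- carol's update was rejected
    ```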

    Quantifying Eavesdropping Vulnerability in Sensor Networks

    With respect to security, sensor networks have a number of considerations that separate them from traditional distributed systems. First, sensor devices are typically vulnerable to physical compromise. Second, they have significant power and processing constraints. Third, the most critical security issue is protecting the (statistically derived) aggregate output of the system, even if individual nodes may be compromised. We suggest that these considerations merit a rethinking of traditional security techniques: rather than depending on the resilience of cryptographic techniques, in this paper we develop new techniques to tolerate compromised nodes and to even mislead an adversary. We present our initial work on probabilistically quantifying the security of sensor network protocols, with respect to sensor data distributions and network topologies. Beginning with a taxonomy of attacks based on an adversary's goals, we focus on how to evaluate the vulnerability of sensor network protocols to eavesdropping. Different topologies and aggregation functions provide different probabilistic guarantees about system security, and make different trade-offs in power and accuracy.
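
    As one illustration of such a probabilistic guarantee, the Monte Carlo sketch below estimates the probability that an adversary tapping k of n sensors reconstructs the network-wide average within a tolerance. The parameters and the Gaussian reading distribution are illustrative assumptions, not values from the paper.

    ```python
    import random
    import statistics

    def vulnerability(n=100, k=10, tolerance=0.5, trials=10_000):
        exposed = 0
        for _ in range(trials):
            readings = [random.gauss(20.0, 2.0) for _ in range(n)]  # sensor data
            true_avg = statistics.mean(readings)
            tapped = random.sample(readings, k)   # values the adversary overhears
            if abs(statistics.mean(tapped) - true_avg) <= tolerance:
                exposed += 1
        # Fraction of trials where the adversary's estimate was "good enough"
        return exposed / trials

    print(f"P(aggregate recovered within +/-0.5): {vulnerability():.2f}")
    ```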

    Reconciling Differences

    In this paper we study a problem motivated by the management of changes in databases. It turns out that several such change scenarios, e.g., the separately studied problems of view maintenance (propagation of data changes) and view adaptation (propagation of view definition changes), can be unified as instances of query reformulation using views, provided that support for the relational difference operator exists in the context of query reformulation. Exact query reformulation using views in positive relational languages is well understood, and has a variety of applications in query optimization and data sharing. Unfortunately, most questions about queries become undecidable in the presence of difference (or negation), whether we use the foundational set semantics or the more practical bag semantics. We present a new way of managing this difficulty by defining a novel semantics, Z-relations, where tuples are annotated with positive or negative integers. Z-relations conveniently represent data, insertions, and deletions in a uniform way, and allow deletions to be applied with the union operator (deletions are tuples with negative counts). We show that under Z-semantics, relational algebra (RA) queries have a normal form consisting of a single difference of positive queries, and this leads to the decidability of their equivalence. We provide a sound and complete algorithm for reformulating RA queries, including queries with difference, over Z-relations. Additionally, we show how to support standard view maintenance.
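
    A minimal Python sketch of the Z-relation representation: tuples map to integer counts, a deletion is a tuple with a negative count, and applying a delta is just count-wise union. Function names are illustrative.

    ```python
    def z_union(r, s):
        # Union of Z-relations adds the integer annotations tuple-wise.
        out = dict(r)
        for t, c in s.items():
            out[t] = out.get(t, 0) + c
        return {t: c for t, c in out.items() if c != 0}

    def z_difference(r, s):
        # Difference is just union with negated counts.
        return z_union(r, {t: -c for t, c in s.items()})

    base  = {("alice", 30): 1, ("bob", 25): 1}
    delta = {("bob", 25): -1, ("carol", 41): 1}   # delete bob, insert carol

    print(z_union(base, delta))
    # {('alice', 30): 1, ('carol', 41): 1}
    ```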

    REX: Recursive, Delta-Based Data-Centric Computation

    In today's Web and social network environments, query workloads include ad hoc and OLAP queries, as well as iterative algorithms that analyze data relationships (e.g., link analysis, clustering, learning). Modern DBMSs support ad hoc and OLAP queries, but most are not robust enough to scale to large clusters. Conversely, "cloud" platforms like MapReduce execute chains of batch tasks across clusters in a fault tolerant way, but have too much overhead to support ad hoc queries. Moreover, both classes of platforms incur significant overhead in executing iterative data analysis algorithms. Most such iterative algorithms repeatedly refine portions of their answers, until some convergence criterion is reached. However, general cloud platforms typically must reprocess all data in each step. DBMSs that support recursive SQL are more efficient in that they propagate only the changes in each step -- but they still accumulate each iteration's state, even if it is no longer useful. User-defined functions are also typically harder to write for DBMSs than for cloud platforms. We seek to unify the strengths of both styles of platforms, with a focus on supporting iterative computations in which changes, in the form of deltas, are propagated from iteration to iteration, and state is efficiently updated in an extensible way. We present a programming model oriented around deltas, describe how we execute and optimize such programs in our REX runtime system, and validate that our platform also handles failures gracefully. We experimentally validate our techniques, and show speedups over the competing methods ranging from 2.5 to nearly 100 times.
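
    The delta-propagation idea can be sketched with semi-naive transitive closure in Python: each iteration joins only the newly derived facts (the delta) against the base relation, instead of reprocessing all data. This illustrates the general technique, not REX's actual programming model.

    ```python
    edges = {(1, 2), (2, 3), (3, 4)}

    reach = set(edges)      # accumulated state
    delta = set(edges)      # changes produced by the previous iteration
    while delta:
        # Join only the delta with the base edges; keep genuinely new facts.
        new = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = new - reach
        reach |= delta

    print(sorted(reach))
    # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
    ```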

    An XML Query Engine for Network-Bound Data

    XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources -- namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output -- while providing good performance for both batch-oriented and ad hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query's input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability.
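
    The streaming behavior can be approximated in a few lines with Python's incremental XML parser: matches for a simple path expression are emitted while the document is still being parsed. This is a conceptual sketch, not Tukwila's operator implementation.

    ```python
    import xml.etree.ElementTree as ET
    from io import BytesIO

    stream = BytesIO(b"""
    <catalog>
      <book><title>Data on the Web</title><year>1999</year></book>
      <book><title>Readings in Database Systems</title><year>2005</year></book>
    </catalog>""")

    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            print("match:", elem.text)   # emitted incrementally, mid-parse
        if elem.tag == "book":
            elem.clear()   # discard processed subtrees to bound memory
    ```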

    Piazza: Data Management Infrastructure for Semantic Web Applications

    The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.
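
    A toy sketch of the mapping-chaining idea: to answer a request at one node with data from another, compose point-to-point mappings along a path in the mapping graph. The node names, record shapes, and breadth-first strategy are illustrative assumptions, not Piazza's algorithm.

    ```python
    from collections import deque

    # Hypothetical local mappings between neighboring nodes' schemas.
    mappings = {
        ("nodeA", "nodeB"): lambda rec: {"name": rec["author"]},
        ("nodeB", "nodeC"): lambda rec: {"creator": rec["name"]},
    }

    def translate(record, source, target):
        # Breadth-first search over the mapping graph, composing mappings.
        queue, seen = deque([(source, record)]), {source}
        while queue:
            node, rec = queue.popleft()
            if node == target:
                return rec
            for (src, dst), f in mappings.items():
                if src == node and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, f(rec)))
        return None

    print(translate({"author": "Halevy"}, "nodeA", "nodeC"))
    # {'creator': 'Halevy'} -- reached by chaining two local mappings
    ```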

    MOSAIC: Unified Platform for Dynamic Overlay Selection and Composition

    MOSAIC constructs new overlay networks with desired characteristics by composing existing overlays with subsets of those attributes. Thus, MOSAIC overcomes the problem of multiple network infrastructures that are partial solutions, while preserving deployability. The system supports composition of control planes, data planes, or both. MOSAIC overlays are specified in Mozlog, a declarative language that specifies overlay properties without binding them to a particular implementation or underlying network. This paper focuses on the runtime aspects of MOSAIC: how it enables interoperability between different overlay networks, and how it implements switching between different overlay compositions, permitting dynamic compositions with both existing overlay networks and legacy applications. The system is validated experimentally using declarative overlay compositions concisely specified in Mozlog: an indirection overlay that supports mobility (i3), a resilient overlay (RON), and scalable lookups (Chord), all of which are combined to provide new functionality. MOSAIC provides the benefits of runtime composition to simultaneously deliver application-aware mobility, NAT traversal, and reliability with low performance overhead, demonstrated by measurements on both a local cluster and PlanetLab.
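
    Overlay composition can be sketched as layering, with each overlay wrapping the send path of the layer beneath it so their properties stack. The classes below are loose Python stand-ins for RON-style resilience and i3-style indirection, not MOSAIC's Mozlog specifications.

    ```python
    class BaseNetwork:
        def send(self, dst, msg):
            print(f"IP send to {dst}: {msg}")

    class ResilientOverlay:                    # RON-like: resilient paths
        def __init__(self, lower):
            self.lower = lower
        def send(self, dst, msg):
            self.lower.send(dst, f"[resilient]{msg}")

    class IndirectionOverlay:                  # i3-like: send to an identifier
        def __init__(self, lower, rendezvous):
            self.lower, self.rv = lower, rendezvous
        def send(self, ident, msg):
            # Indirection supports mobility: an identifier, not an address,
            # names the destination; the rendezvous table can be re-pointed.
            self.lower.send(self.rv[ident], f"[via {ident}]{msg}")

    # Compose: indirection (mobility) over resilience over plain IP.
    net = IndirectionOverlay(ResilientOverlay(BaseNetwork()),
                             rendezvous={"alice-id": "10.0.0.7"})
    net.send("alice-id", "hello")
    # IP send to 10.0.0.7: [resilient][via alice-id]hello
    ```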

    TAP: Time-Aware Provenance for Distributed Systems

    In this paper, we explore the use of provenance for analyzing execution dynamics in distributed systems. We argue that provenance could have significant practical benefits for system administrators, e.g., for reasoning about changes in a system's state, diagnosing protocol misconfigurations, detecting intrusions, and pinpointing performance bottlenecks. However, to realize this vision, we must revisit several aspects of provenance management. As a first step, we present time-aware provenance (TAP), an enhanced provenance model that explicitly represents time, distributed state, and state changes. We outline our research agenda towards developing novel query processing, languages, and optimization techniques that can be used to efficiently and securely query time-aware provenance, even in the presence of transient state or untrusted nodes.
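
    A minimal sketch of what a time-aware provenance store might record: each derivation carries the interval during which it held, so one can ask why a fact was true at a particular time even after the state changed. The structures are hypothetical, not TAP's actual model.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Derivation:
        fact: str
        inputs: tuple   # the facts this one was derived from
        start: int      # time the derivation became valid
        end: int        # time it was invalidated (a state change)

    log = [
        Derivation("route(A,C)", ("link(A,B)", "link(B,C)"), start=1, end=5),
        Derivation("route(A,C)", ("link(A,D)", "link(D,C)"), start=5, end=9),
    ]

    def why_at(fact, t):
        """Return the derivations explaining `fact` at time t."""
        return [d for d in log if d.fact == fact and d.start <= t < d.end]

    print(why_at("route(A,C)", 6))
    # Explains the route via D, not the earlier (since-changed) route via B.
    ```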